Genetic diversity of Prunus armeniaca L. var. ansu Maxim. germplasm revealed by simple sequence repeat (SSR) markers

The genetic diversity and genetic structure of P. armeniaca var. ansu were analyzed using SSR markers to provide a scientific basis for the conservation, efficient utilization, marker-assisted breeding and improved variety selection of P. armeniaca var. ansu germplasm resources. The results showed that the level of genetic diversity within the population was high. Across the 30 SSR markers, the mean number of observed alleles was 11.433, the mean number of effective alleles was 4.433, the mean Shannon information index was 1.670, and the mean polymorphic information content was 0.670. Among the eight provenances, Tuanjie Township, Xinyuan County, Xinjiang had the highest genetic diversity. The observed alleles, effective alleles, Shannon information index and Nei’s gene diversity index among provenances were higher than those within provenances. Based on Bayesian mathematical modeling and UPGMA cluster analysis, the 86 P. armeniaca var. ansu accessions were divided into three subpopulations and four groups, respectively, which reflected individual differences among provenances. Both the subpopulations classified by Bayesian mathematical modeling and the groups classified by UPGMA cluster analysis were significantly correlated with geographical provenance (P < 0.01), indicating that provenance played an important role in the classification of groups. The genetic distance between Tuanjie Township and Alemale Township, both in Xinyuan County, was the smallest, indicating that these two provenances had the closest genetic relationship and a small degree of genetic differentiation.


Introduction
Prunus armeniaca L. refers to both the wild progenitor and the cultivated species, and belongs to the family Rosaceae [1]. This species is an important stone fruit that is widely grown in the temperate regions of the world, with an annual worldwide production of ~4.1 million tons (FAO, 2019). It is native to the Yellow River Basin in China, widely distributed across the mid-temperate zone and warm temperate zone of China, and covering most of the and utilization of this species. In this study, the genetic structure and diversity of 86 P. armeniaca var. ansu accessions were analyzed using SSR molecular markers in order to provide a scientific basis for the conservation and efficient utilization of P. armeniaca var. ansu germplasm resources, as well as guidance for the breeding of superior varieties.

Plant materials
The plant materials consisted of 86 P. armeniaca var. ansu accessions, which were selected from 8 provenances in 2011. Detailed geographic location information for the 86 accessions is given in S1 Table. These accessions were preserved by asexual reproduction (grafting) in the National Forest Germplasm Resource Preservation Repository for Prunus species of Shenyang Agricultural University (Kazuo, Liaoning, China). Samples were collected in June 2020. Leaves were taken from one-year-old branches in the middle of the crown of each sample tree. The collected leaves were numbered and labeled, wrapped in tinfoil, quickly frozen in liquid nitrogen, and stored at -80˚C.

DNA extraction
A genomic DNA extraction kit (Tiangen Biochemical Technology Co., Ltd., Beijing) was used to extract DNA. DNA quality was assessed via 1% agarose gel electrophoresis, and purity was tested by a NanoDrop 2000 spectrophotometer (NanoDrop, USA). The analyzed DNA samples were stored in a -20˚C refrigerator until further use.

SSR primer synthesis and PCR amplification
A total of 600 pairs of primers for P. armeniaca var. ansu were designed previously [31]; the 30 pairs with a high rate of polymorphism were selected and synthesized by Beijing Saibaisheng Bioengineering Co., Ltd. (S2 Table). Amplification was carried out in a 20 μL PCR reaction mixture containing 20 ng template DNA, 0.125 μmol/L of each primer, 2.0 mmol/L Mg2+, 1.125 U Taq polymerase, and 0.45 mmol/L dNTPs. The PCR amplification procedure was as follows: denaturation at 94˚C for 5 min, followed by 34 amplification cycles (denaturation at 94˚C for 30 s, annealing at 55˚C for 30 s, with annealing temperatures adjusted according to the primers listed in S2 Table, and extension at 72˚C for 30 s), and a final extension at 72˚C for 5 min. The PCR amplification products were separated by non-denaturing polyacrylamide gel electrophoresis and, after fixation, staining, rinsing and imaging, were photographed and recorded in the gel imaging system (BIO-RAD, USA).

Statistical analysis
The gel image bands were analyzed with Image Lab 4.0 software, and the data module of was used to determine uniform genotyping results. POPGENE version 1.32 was used to calculate the number of observed alleles (Na), the number of effective alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), percentage of polymorphic loci (PPL), Shannon's information index (I), Nei's gene diversity index (H), inbreeding coefficient (Fis), fixation index (Fit), genetic differentiation coefficient (Fst), gene flow (Nm), genetic distance and genetic similarity coefficient [32][33][34]. Cervus version 3.0.7 was used to calculate polymorphic information content (PIC) [35]. The genetic similarity matrices between accessions were obtained using the SM similarity coefficient method in the NTSYS-pc 2.10e software. The accessions were then clustered by the unweighted pair group method with arithmetic mean (UPGMA) to obtain a dendrogram [36]. STRUCTURE 2.3.4 was employed to analyze population structure, using its Bayesian model-based clustering method to infer the genetic structure [37]. The calculations were carried out as described by [38], with the default admixture model and the independent allele frequency model. K was set from 1 to 10, and each model run was repeated 10 times. The burn-in period was set to 100,000, followed by 100,000 MCMC iterations. The peak value of ΔK was used to determine the optimal K using STRUCTURE HARVESTER (http://taylor0.biology.ucla.edu/struct_harvest/) [39,40]. Analysis of molecular variance (AMOVA) was performed using GenAlEx 6.502 [41]. Chi-square tests were conducted using SPSS 22.0 [42].
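As an illustration of the per-locus indices listed above, the following minimal Python sketch computes Na, Ne, Nei's gene diversity, Shannon's index and PIC from a set of allele frequencies using their standard textbook definitions. The function name `locus_diversity` and the example frequencies are hypothetical, and the sketch omits any small-sample corrections that POPGENE or Cervus may apply, so it should not be read as a reproduction of those packages.

```python
import math

def locus_diversity(freqs):
    """Return standard diversity indices for one locus.

    freqs: allele frequencies at the locus (hypothetical input; must sum to 1).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    hom = sum(p * p for p in freqs)                # expected homozygosity, sum of p_i^2
    na = sum(1 for p in freqs if p > 0)            # number of observed alleles (Na)
    ne = 1.0 / hom                                 # number of effective alleles (Ne)
    h = 1.0 - hom                                  # Nei's gene diversity (H)
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)  # Shannon's index (I), natural log
    # PIC = 1 - sum p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = h - cross
    return {"Na": na, "Ne": ne, "H": h, "I": shannon, "PIC": pic}

# Hypothetical locus with three alleles (not data from this study)
example = locus_diversity([0.5, 0.25, 0.25])
print(example)
```

A locus with many alleles at similar frequencies gives high values for all five indices, whereas a locus dominated by one allele gives Ne close to 1 and PIC close to 0.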

Genetic diversity analysis of SSR markers in P. armeniaca var. ansu
The mean number of observed alleles across the 30 SSR markers was 11.433, ranging from 3 to 23. The mean number of effective alleles was 4.433, ranging from 1.151 to 12.016. The mean Shannon information index was 1.670, ranging from 0.285 to 2.773. The mean polymorphic information content was 0.670, ranging from 0.125 (primer P3) to 0.912 (Table 1). The tested SSR markers revealed a high level of polymorphism and genetic diversity in P. armeniaca var. ansu. The mean observed and expected heterozygosity were 0.295 and 0.696, respectively. Expected heterozygosity was higher than observed heterozygosity for 29 of the 30 SSR markers (96.67%) (Table 1). These results indicated that heterozygosity in the P. armeniaca var. ansu population was low. (Table 2).
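To make the comparison between observed and expected heterozygosity concrete, the sketch below computes both quantities for a single locus from diploid genotypes. The function name `heterozygosity` and the allele sizes are hypothetical, and expected heterozygosity is taken here as the uncorrected 1 - Σp², which may differ slightly from the estimator used by POPGENE.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity: 1 - sum of squared allele frequencies
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he

# Hypothetical allele sizes in bp for four individuals (not data from this study)
ho, he = heterozygosity([(150, 150), (150, 154), (154, 154), (158, 158)])
print(ho, he)
```

In this toy example Ho is well below He, which is the same pattern of heterozygote deficit described for most of the markers above.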

Molecular variance analysis within and among P. armeniaca var. ansu populations
AMOVA indicated that 83% of the genetic variation was found within P. armeniaca var. ansu populations, and only 17% of the variation occurred among P. armeniaca var. ansu populations (Table 3).

Analysis of the genetic structure of P. armeniaca var. ansu
Subpopulations were delimited according to the 'Hierarchical Island' model proposed by Evanno et al. (2005), in which the K value at the peak of ΔK is closest to the actual number of subpopulations. ΔK was largest when K was 3 (Fig 1). Therefore, the number of subpopulations of P. armeniaca var. ansu was determined to be 3 (K = 3), indicating that there were 3 subpopulations with different genetic structures. The drawing module of STRUCTURE 2.3 was used to create a bar graph of the Q value distribution under the optimal population structure (Fig 2). The Q matrix of P. armeniaca var. ansu (K = 3) is shown in S3 Table. The P. armeniaca var. ansu accessions were divided into 3 subpopulations (S1, S2, and S3). The red bars in Fig 2 represent the first subpopulation (S1), which consisted of 26 accessions from the Xinjiang provenances, including Tuanjie Township (15), Qianjin Pasture (5), Alemale Township (4), and Huocheng (2). The green bars represent the second subpopulation (S2), which consisted of 25 accessions from the northwest provenances, including Pengyang (3), Haiyuan (7), Zhenyuan (8), and Huining (7). The blue bars represent the third subpopulation (S3), which consisted of 35 accessions, all of which were from Pengyang. Further analysis indicated that there was gene exchange among the three subpopulations. Some accessions in S1 carried genetic components from S2 and S3, some accessions in S2 carried genetic components from S3, and some accessions in S3 carried genetic components from S1 and S2.
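The ΔK criterion used above can be sketched as follows. This is a minimal illustration that assumes the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], with L(K) taken as the mean Ln P(D) over replicate runs; the function name `delta_k` and the log-likelihood values are hypothetical and are not the STRUCTURE output from this study.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Return {K: deltaK} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of Ln P(D) values from replicate runs.
    """
    ks = sorted(lnp_by_k)
    m = {k: mean(lnp_by_k[k]) for k in ks}    # mean log-likelihood per K
    s = {k: stdev(lnp_by_k[k]) for k in ks}   # standard deviation across replicates
    result = {}
    for k in ks:
        if (k - 1) in m and (k + 1) in m and s[k] > 0:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            result[k] = second_diff / s[k]
    return result

# Hypothetical replicate log-likelihoods for K = 1..5 (three runs each)
runs = {
    1: [-5000, -5002, -4998],
    2: [-4700, -4704, -4696],
    3: [-4300, -4302, -4298],
    4: [-4290, -4296, -4284],
    5: [-4285, -4295, -4275],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK is undefined for the smallest and largest K tested, which is one reason a range such as K = 1 to 10 is run even when a small number of subpopulations is expected.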

Analysis of genetic relationships of P. armeniaca var. ansu
The UPGMA cluster analysis of the 86 P. armeniaca var. ansu accessions based on SSR markers is shown in Fig 3.

Correlations between subpopulations of P. armeniaca var. ansu based on mathematical modeling and groups based on UPGMA cluster analysis
Chi-square tests indicated that the correlations between the 4 groups based on the UPGMA cluster analysis and the 3 subpopulations based on mathematical modeling were highly significant (P < 0.01) (Table 4). (Table 5). These results indicated that there was genetic differentiation within and among provenances. The genetic variation of P. armeniaca var. ansu occurred primarily within provenances, with a small amount occurring between provenances. Nei's genetic distance and genetic identity among provenances are shown in Table 4. Among the 8 provenances, the pairwise genetic distance ranged from 0.180 to 1.204, with an average of 0.627. The genetic distance between Tuanjie Township and Alemale Township was the smallest (0.180), indicating a small degree of genetic differentiation. The genetic distance between Haiyuan and Huocheng was the largest (1.204), indicating a large degree of genetic differentiation. The pairwise genetic identity ranged from 0.300 to 0.836, with an average of 0.558. The genetic identity between Tuanjie Township and Alemale Township was the largest (0.836), while the genetic identity between Haiyuan and Huocheng was the smallest (0.300). The genetic identity indices were negatively correlated with the genetic distance indices (Table 6).
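The inverse relationship between genetic identity and genetic distance follows directly from Nei's definition, D = -ln(I). The sketch below, which assumes Nei's standard (uncorrected) identity and uses the hypothetical function names `nei_identity` and `nei_distance`, shows that the extreme identity values reported above map onto the reported extreme distances to within rounding.

```python
import math

def nei_identity(pop_x, pop_y):
    """Nei's genetic identity between two populations.

    pop_x, pop_y: lists of per-locus allele-frequency lists, aligned by locus
    and by allele within each locus (hypothetical input layout).
    """
    n_loci = len(pop_x)
    jx = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    jy = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    jxy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    return jxy / math.sqrt(jx * jy)

def nei_distance(identity):
    """Nei's standard genetic distance, D = -ln(I)."""
    return -math.log(identity)

# Identity values reported in the text
d_closest = nei_distance(0.836)   # Tuanjie Township vs Alemale Township
d_farthest = nei_distance(0.300)  # Haiyuan vs Huocheng
print(round(d_closest, 3), round(d_farthest, 3))

# Hypothetical two-locus example: the populations share locus 1 and differ at locus 2
pop_a = [[0.5, 0.5], [1.0, 0.0]]
pop_b = [[0.5, 0.5], [0.0, 1.0]]
identity_ab = nei_identity(pop_a, pop_b)
```

Because D is a monotonically decreasing function of I, the pair of provenances with the largest identity necessarily has the smallest distance, and vice versa.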

Discussion
Genetic diversity is the sum total of the genetic information carried by a species, which reflects the adaptability and evolutionary potential of populations in the environment [43]. Species with high genetic diversity can better adapt to environmental changes and are also more susceptible to environmental influences [44]. Studies focused on molecular-level genetic diversity typically employ Shannon's information index, as well as several other metrics to measure the genetic diversity of germplasm [45,46]. In our study, 30 SSR markers were used to analyze the genetic diversity of P. armeniaca var. ansu. Examination of these markers in the P. armeniaca var. ansu population revealed high levels of genetic diversity (He = 0.696). This level of diversity was higher than that of Prunus mume (He = 0.497) [47], and Prunus brigantina (He = 0.48) [48], but lower than that of Prunus sibirica (He = 0.774) [21] and Prunus armeniaca (He = 0.792) [13]. The level of genetic diversity may be related to molecular markers, samples, environmental conditions, and other factors. For example, P. armeniaca var. ansu showed a higher level of genetic diversity than Prunus armeniaca in Iran (He = 0.63) [49] and Tunisia (He = 0.56) [50], but a lower level of genetic diversity than Prunus armeniaca in Turkey (He = 0.72) [51], China (He = 0.774) [21], and Pakistan (He = 0.77) [52]. Plant growth is influenced by the genotype, environment and management factors. Asexual reproduction can maintain the excellent characteristics of the female parent. Different clones may show different phenotypes in the same environment, and the same clone may show different phenotypes in a different environment [53]. Among the 8 provenances examined in our study, Xinjiang province had the highest level of genetic diversity, with populations from Xinyuan County in Ili Kazakh Autonomous Prefecture showing the most diversity. 
This result was consistent with the conclusion that Ili was the center of origin for cultivated apricots [12]. Plant populations are not randomly distributed but are structured in space and time [54]. In this study, the P. armeniaca var. ansu population was divided into 3 subpopulations with a Bayesian model (Fig 1), which is mostly consistent with geographical distribution patterns among provenances. P. armeniaca var. ansu samples from provenances that were relatively close geographically were primarily found in the same subpopulation and had higher degrees of gene exchange (Fig 2). UPGMA clustering analysis of the 86 P. armeniaca var. ansu accessions showed that those classified into the same subpopulation had close genetic relationships (Fig 3). Overall, these findings suggested that the population genetic variation in P. armeniaca var. ansu is significantly impacted by geographical distribution. To further elucidate the relationships among the populations, we analyzed the genetic distance and genetic identity between provenances. This analysis revealed that the genetic identity of P. armeniaca var. ansu between provenances was negatively correlated with geographical distance, implying that P. armeniaca var. ansu may have undergone a pattern of isolation by dispersal limitation [55]. This phenomenon is generally consistent with isolation by distance (IBD) [56], which has also been reported in P. armeniaca [23].
Genetic structure results from the joint action of mutation, selection, migration, and drift [54], which cause changes in allele frequency that result in genetic differentiation [43]. Assessment of genetic differentiation revealed that the variation in P. armeniaca var. ansu mainly existed within populations (83%), which is similar to earlier results for Prunus sibirica [57], P. armeniaca [58] and most tree species [43]. Nevertheless, genetic differentiation among the P. armeniaca var. ansu geographic groups was high (Fst = 0.255 > 0.015) [59], which suggests a relationship between environment and genetic differentiation. The F-statistics indicated a degree of inbreeding in the mating system of P. armeniaca var. ansu (0 < Fit < 1) [43], and the heterozygosity of the population was low (Fis = 0.328). All samples in this study were from Central Asia, where apricot is generally self-sterile [60]. Therefore, we speculate that the low heterozygosity found in P. armeniaca var. ansu was likely caused by mating among close relatives rather than self-pollination. Considering the low gene flow in the P. armeniaca var. ansu population (Nm = 0.731 < 1) [43], migration may have had little effect on its genetic differentiation.
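The link between the reported Fst and Nm can be checked with a short calculation. The sketch below assumes the widely used island-model approximation Nm = (1 - Fst) / (4 Fst) and Wright's identity (1 - Fit) = (1 - Fis)(1 - Fst); the function names are hypothetical, and the Fit value it returns is only the value implied by that identity, not a figure reported in this study.

```python
def gene_flow(fst):
    """Estimate Nm from Fst, assuming Nm = (1 - Fst) / (4 * Fst)."""
    return (1.0 - fst) / (4.0 * fst)

def total_fixation(fis, fst):
    """Fit implied by Wright's identity (1 - Fit) = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)

# Values reported in the text
nm = gene_flow(0.255)
fit = total_fixation(0.328, 0.255)
print(round(nm, 3), round(fit, 3))
```

With Fst = 0.255 this gives Nm of about 0.730, in line with the reported 0.731; the small difference is presumably due to rounding of Fst. Under this approximation Nm falls below 1 whenever Fst exceeds 0.2.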
In the forest ecosystem, the extinction of a tree species can produce a chain reaction, which can lead to the extinction of some local appendage species [61]. Our results show that there was a degree of inbreeding in P. armeniaca var. ansu. With the intensification of inbreeding depression, forest productivity and population survivability will decline, which may lead to the extinction of this species, so it is necessary to preserve P. armeniaca var. ansu genetic resources. Considering the scattered and wide distribution of P. armeniaca var. ansu, and based on the existing genetic diversity and genetic structure of its populations, we propose a conservation strategy that combines in situ and ex situ protection. Measures that can be taken for in situ protection include establishing nature reserves and forest reserves, prohibiting grazing, controlling the degree of utilization of wild resources, and encouraging the vigorous promotion of P. armeniaca var. ansu resources in suitable areas. However, it is difficult to implement in situ protection for all species, so ex situ protection should be given more attention. In addition, increasing the tending management of stands could also contribute to the protection of P. armeniaca var. ansu resources [43]. The main objective in genetic resource conservation programs should be to maintain the highest possible level of genetic variability [62]. We have established a National Forest Germplasm Resource Preservation Repository for Prunus species, which requires seeds and scions from each population for ex situ protection. We will also collect germplasm resources in greater breadth and depth in the future. Based on the genetic diversity results for P. armeniaca var. ansu from different provenances, the resources of Tuanjie Township, Xinyuan County, Xinjiang should be protected first.

Conclusion
The P. armeniaca var. ansu population had a high level of genetic diversity, with accessions from Tuanjie Township, Xinyuan County being the most diverse. The level of genetic diversity among provenances was higher than that within provenances, and there was genetic differentiation within and among provenances. The genetic variation of P. armeniaca var. ansu mainly occurred within provenances, with a small degree present between them. The genetic relationship between Tuanjie Township and Alemale Township, both in Xinyuan County, was the closest, and their degree of genetic differentiation was the smallest. Provenances played an important role in the classification of groups, while geographical distance was closely related to genetic difference. These results highlight the importance of accounting for provenances in future breeding efforts. Taken together, the results of our study provide a new scientific basis for the conservation, efficient utilization and breeding of P. armeniaca var. ansu germplasm.
Supporting information S1